Abstract

We investigate the channel output distribution when the rate is above capacity. We show that the
typical output sequences satisfy an asymptotic equipartition property (AEP), independently of the specific
codebook used, as long as the codebook is typical according to the standard random codebook generation.
This equipartition of the typical output sequences is caused by the mixing of input sequences when there
are too many of them, namely, when the rate is above capacity. This result sheds some light on the
optimal design of compress-and-forward relay schemes.

I. Introduction 

A fundamental observation of Shannon's channel coding theorem is that using a randomly
generated codebook (i.i.d. generated according to some $p_0(x)$) at a rate below capacity leads
to a distribution pattern of the output sequences from which a decoding scheme with arbitrarily
low probability of error can be devised.

In this paper, we are interested in the case when the rate is above capacity. We will show
that the pattern that enables decoding disappears when there are too many input
sequences, i.e., when the rate is above capacity. Instead, in this case, the output has an
asymptotic equipartition property on the set of typical output sequences (typical with respect to
$p_0(y) = \sum_x p_0(x) p(y|x)$). Interestingly, this set is independent of the specific codebook used, as
long as the codebook is typical according to the random codebook generation. The reason for
this equipartition is that the input sequences are too dense, so that different input sequences can
contribute to the same output sequence and get mixed up.

Part of the work [1] was presented at CWIT 2009. 
This study of the output distribution when the rate is above capacity was motivated by the search
for the optimal compress-and-forward relay scheme. The optimality of the compress-and-forward schemes
is arguably one of the most critical problems in the development of network information theory,
where ambiguity always arises when decoding cannot be done correctly. In the classical approach
of [2], the compression scheme at the relay was based only on the distribution used for generating
the codebook at the source, instead of the specific codebook generated. While many different
codebooks can be generated according to the same distribution, can the knowledge of the specific
codebook be helpful? There have been some discussions on this issue (e.g., [3]). In this
paper, we show that the observations at the relay are essentially independent of the specific
codebook used at the source, and depend only on the distribution by which the codebook is
generated.

To further explore the optimality of the compress-and-forward schemes, we compare the rates 
needed to losslessly compress the relay's observation in two different scenarios: i) the relay uses 
the knowledge of the source's codebook to do the compression; ii) the relay simply ignores this 
knowledge. It is shown that the minimum required rates in both scenarios are the same when the 
rate of the source's codebook is above the capacity of the source-to-relay link. 

The remainder of the paper is organized as follows. In Section II, we first introduce
some standard definitions of strongly typical sequences, and then give a definition of typical
codebooks. Then, we summarize our main results in Section III, followed by the proofs of these
results in Sections IV, V and VI. Finally, as an application of the results, the optimality of the
compress-and-forward schemes is discussed in Section VII.

II. Preliminaries 

Consider a discrete memoryless channel $(\mathcal{X}, p(y|x), \mathcal{Y})$ with capacity $C := \max_{p(x)} I(X;Y)$.
Under the random coding framework, a random codebook $\mathcal{C}$ with respect to $p_0(x)$ with rate $R$
and block length $n$ is defined as

$$\mathcal{C} := \{X^n(w),\ w = 1, \ldots, 2^{nR}\}, \qquad (1)$$

where each codeword in $\mathcal{C}$ is an i.i.d. random sequence generated according to a fixed input
distribution $p_0(x)$.

It is well known that information can be transmitted with arbitrarily small probability of error 
for sufficiently large n if R < C. In this paper, however, we are interested in the case where the 
rate is above capacity. 
A. Strong Typicality

We begin with some standard definitions on strong typicality [3, Ch.13]. 
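Definition 2.1: The $\epsilon$-strongly typical set with respect to $p_0(x)$, denoted by $A_{\epsilon,0}^{(n)}(X)$, is the set
of sequences $x^n \in \mathcal{X}^n$ satisfying:

1. For all $a \in \mathcal{X}$ with $p_0(a) > 0$,

$$\left| \frac{1}{n} N(a|x^n) - p_0(a) \right| < \frac{\epsilon}{|\mathcal{X}|}.$$

2. For all $a \in \mathcal{X}$ with $p_0(a) = 0$, $N(a|x^n) = 0$.

Here $N(a|x^n)$ is the number of occurrences of $a$ in $x^n$.
Similarly, we can define the $\epsilon$-strongly typical set with respect to $p_0(y)$ and denote it by $A_{\epsilon,0}^{(n)}(Y)$.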
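As an illustration, the membership test of Definition 2.1 can be sketched in a few lines of Python. The function name `is_strongly_typical`, the dictionary representation of $p_0$, and the threshold $\epsilon/|\mathcal{X}|$ (the usual normalization in the strong-typicality literature) are assumptions made for this sketch and are not taken from the paper.

```python
from collections import Counter


def is_strongly_typical(seq, p0, eps):
    """Check whether seq is eps-strongly typical with respect to p0.

    p0 maps each alphabet symbol to its probability (hypothetical representation).
    The deviation threshold eps / |alphabet| is an assumed normalization.
    """
    n = len(seq)
    counts = Counter(seq)
    # Symbols outside the alphabet can never appear in a typical sequence.
    if any(symbol not in p0 for symbol in counts):
        return False
    for symbol, prob in p0.items():
        if prob == 0:
            # Condition 2: zero-probability symbols must not occur at all.
            if counts[symbol] != 0:
                return False
        elif abs(counts[symbol] / n - prob) >= eps / len(p0):
            # Condition 1: empirical frequency must be close to p0.
            return False
    return True


p0 = {"0": 0.5, "1": 0.5}
print(is_strongly_typical("0101010101", p0, 0.1))
print(is_strongly_typical("0000000001", p0, 0.1))
```

The same counting approach extends to the joint and conditional definitions by counting pairs $(a, b)$ instead of single symbols.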

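Definition 2.2: The $\epsilon$-strongly typical set with respect to $p_0(x,y)$, denoted by $A_{\epsilon,0}^{(n)}(X,Y)$, is
the set of sequences $(x^n, y^n) \in \mathcal{X}^n \times \mathcal{Y}^n$ satisfying:

1. For all $(a,b) \in \mathcal{X} \times \mathcal{Y}$ with $p_0(a,b) > 0$,

$$\left| \frac{1}{n} N(a,b|x^n,y^n) - p_0(a,b) \right| < \frac{\epsilon}{|\mathcal{X}||\mathcal{Y}|}.$$

2. For all $(a,b) \in \mathcal{X} \times \mathcal{Y}$ with $p_0(a,b) = 0$,

$$N(a,b|x^n,y^n) = 0.$$

Here $N(a,b|x^n,y^n)$ is the number of occurrences of the pair $(a,b)$ in the pair of sequences $(x^n, y^n)$.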

Definition 2.3: The $\epsilon$-strongly conditionally typical set with the sequence $x^n$ with respect to
the conditional distribution $p(y|x)$, denoted by $A_{\epsilon}^{(n)}(Y|x^n)$, is the set of sequences $y^n \in \mathcal{Y}^n$
satisfying:

1. For all $(a,b) \in \mathcal{X} \times \mathcal{Y}$ with $p(b|a) > 0$,

$$\frac{1}{n} \left| N(a,b|x^n,y^n) - p(b|a) N(a|x^n) \right| < \epsilon \left( 1 + \frac{1}{|\mathcal{Y}|} \right). \qquad (2)$$

2. For all $(a,b) \in \mathcal{X} \times \mathcal{Y}$ with $p(b|a) = 0$,

$$N(a,b|x^n,y^n) = 0. \qquad (3)$$




B. Typical Codebooks

Definition 2.4: For the discrete memoryless channel $(\mathcal{X}, p(y|x), \mathcal{Y})$, the channel noise is said
to be $\epsilon$-typical if for any given input $x^n$, the output $Y^n$ is $\epsilon$-strongly conditionally typical with
$x^n$ with respect to the channel transition function $p(y|x)$, i.e., $Y^n \in A_{\epsilon}^{(n)}(Y|x^n)$.

Due to the Law of Large Numbers, the channel noise is "typical" with high probability. 

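Index the sequences in $A_{\epsilon,0}^{(n)}(Y)$ as $y_{\epsilon,0}^n(i)$, $i = 1, \ldots, M_{\epsilon,0}^{(n)}$, where $M_{\epsilon,0}^{(n)} = |A_{\epsilon,0}^{(n)}(Y)|$. Consider
the set $F_{\epsilon,0}(i) \subseteq \mathcal{X}^n$, where each sequence in $F_{\epsilon,0}(i)$ is strongly typical and can reach $y_{\epsilon,0}^n(i)$
over a channel with typical noise, i.e.,

$$F_{\epsilon,0}(i) := \{ x^n \in A_{\epsilon,0}^{(n)}(X) : y_{\epsilon,0}^n(i) \in A_{\epsilon}^{(n)}(Y|x^n) \}.$$

The following notation is useful for defining the typical codebooks:

$$P_{\epsilon,0}(i) := \Pr(X^n \in F_{\epsilon,0}(i) \mid X^n \in A_{\epsilon,0}^{(n)}(X)),$$

$$N_{\epsilon,0}(i|\mathcal{C}) := \sum_{w=1}^{2^{nR}} I(x^n(w) \in F_{\epsilon,0}(i)),$$

where $X^n$ is drawn i.i.d. according to $p_0(x)$ and $I(A)$ is the indicator function:

$$I(A) = \begin{cases} 1 & \text{if } A \text{ holds}, \\ 0 & \text{otherwise}. \end{cases}$$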

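Definition 2.5: A codebook

$$\mathcal{C} = \{ x^n(w) \in \mathcal{X}^n,\ w = 1, \ldots, 2^{nR} \}$$

is said to be $\epsilon$-typical with respect to $p_0(x)$ if

1) $x^n(w) \in A_{\epsilon,0}^{(n)}(X)$, $\forall w \in \{1, \ldots, 2^{nR}\}$;

2) $\max_{i \in \{1, \ldots, M_{\epsilon,0}^{(n)}\}} \left| \frac{N_{\epsilon,0}(i|\mathcal{C})}{2^{nR}} - P_{\epsilon,0}(i) \right| < \frac{n^3 R}{2^{nR}}$.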

III. Main Results 

The main results of this paper are summarized by the following three theorems. Their proofs
are presented in Sections IV, V and VI, respectively. The application of these results to the relay
channel will be discussed in Section VII.
Theorem 3.1: Given that an $\epsilon$-typical codebook $\mathcal{C}$ is used and the channel noise is also
$\epsilon$-typical, then$^1$

$$\Pr(Y^n = y_{\epsilon,0}^n(i) \mid \mathcal{C}) \doteq 2^{-nH_0(Y)}, \quad \forall i \in \{1, \ldots, M_{\epsilon,0}^{(n)}\},$$

when $R > I_0(X;Y)$, where both $H_0(Y)$ and $I_0(X;Y)$ are calculated according to $p_0(x,y) =
p_0(x)p(y|x)$.
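The single-letter quantities that appear in the theorems can be computed directly from $p_0(x)$ and $p(y|x)$. The following is a minimal sketch; the function names and the list-based representation of the channel matrix are assumptions made here for illustration, and the binary symmetric channel is only an example.

```python
import math


def entropy(dist):
    """Shannon entropy in bits of a probability vector."""
    return -sum(p * math.log2(p) for p in dist if p > 0)


def info_quantities(p0, channel):
    """Return (H0(Y), H0(Y|X), I0(X;Y)) for input law p0 and channel[x][y] = p(y|x)."""
    num_x, num_y = len(p0), len(channel[0])
    # Output marginal p0(y) = sum_x p0(x) p(y|x).
    p_y = [sum(p0[x] * channel[x][y] for x in range(num_x)) for y in range(num_y)]
    h_y = entropy(p_y)
    h_y_given_x = sum(p0[x] * entropy(channel[x]) for x in range(num_x))
    return h_y, h_y_given_x, h_y - h_y_given_x


# Example: binary symmetric channel with crossover 0.1 and uniform input.
h_y, h_y_given_x, mutual_info = info_quantities([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]])
print(f"H0(Y) = {h_y:.3f}, H0(Y|X) = {h_y_given_x:.3f}, I0(X;Y) = {mutual_info:.3f}")
```

The identity $H_0(Y) = I_0(X;Y) + H_0(Y|X)$ used by the function is the same one that turns the exponents of Lemmas 4.1 and 4.2 into $2^{-nH_0(Y)}$.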

Throughout this paper, we generate the codebook $\mathcal{C}$ at random according to $p_0(x)$ and reserve
only the $\epsilon$-strongly typical codewords. Then we have Theorems 3.2 and 3.3.

Theorem 3.2: For any $\epsilon > 0$,

$$\Pr(\mathcal{C} \text{ is } \epsilon\text{-typical}) \to 1 \text{ as } n \to \infty. \qquad (4)$$

Theorem 3.3: Consider the conditional entropy of the channel output given the source's codebook
information, namely $H(Y^n|\mathcal{C})$. We have

$$\lim_{n \to \infty} \frac{1}{n} H(Y^n|\mathcal{C}) = \begin{cases} H_0(Y) & \text{when } R > I_0(X;Y), \\ R + H_0(Y|X) & \text{when } R < I_0(X;Y), \end{cases}$$

where $H_0(Y)$, $I_0(X;Y)$ and $H_0(Y|X)$ are all calculated according to $p_0(x,y) = p_0(x)p(y|x)$.
In contrast, without the codebook information, we have

$$\lim_{n \to \infty} \frac{1}{n} H(Y^n) = H_0(Y) \text{ for any } R > 0.$$
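A small numerical experiment can make the two regimes of Theorem 3.3 tangible. The sketch below is only illustrative and rests on several assumptions that are not in the paper: a binary symmetric channel, a uniform $p_0(x)$, a very short block length, and no filtering of non-typical codewords. It computes $\frac{1}{n}H(Y^n|\mathcal{C} = c)$ exactly for one randomly drawn codebook $c$.

```python
import math
import random


def output_entropy_rate(n, num_codewords, crossover, rng):
    """Return H(Y^n | C = c) / n in bits for one random codebook c over a BSC."""
    # Assumed setup: codewords are n-bit integers drawn uniformly at random.
    codebook = [rng.getrandbits(n) for _ in range(num_codewords)]
    # weight[d] = probability of one specific error pattern with d bit flips.
    weight = [crossover**d * (1 - crossover) ** (n - d) for d in range(n + 1)]
    entropy = 0.0
    for y in range(2**n):
        # Messages are equiprobable, so average p(y^n | x^n(w)) over the codebook.
        prob = sum(weight[bin(x ^ y).count("1")] for x in codebook) / num_codewords
        if prob > 0:
            entropy -= prob * math.log2(prob)
    return entropy / n


rng = random.Random(0)
n, crossover = 8, 0.1
high = output_entropy_rate(n, 2**n, crossover, rng)  # R = 1, above capacity
low = output_entropy_rate(n, 2, crossover, rng)      # R = 1/8, below capacity
print(f"R = 1.000: H(Y^n|c)/n = {high:.3f}")
print(f"R = 0.125: H(Y^n|c)/n = {low:.3f}")
```

For the high-rate codebook the value should be close to $H_0(Y) = 1$ bit, while for the low-rate codebook it cannot exceed $R + H_0(Y|X)$; at such a short block length the asymptotic statements hold only approximately.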

IV. AEP of Typical Output Sequences 

Essentially, Theorem 3.1 states that there exists an asymptotic equipartition property of the
typical output sequences, irrespective of the specific codebook used, as long as the codebook is
a typical codebook. To prove this theorem, we first introduce two lemmas.

Lemma 4.1: Let $E_\epsilon$ denote the event that the output $Y^n \in A_{\epsilon}^{(n)}(Y|x^n)$ for any given input $x^n$.
For any $x^n \in F_{\epsilon,0}(i)$,

$$\Pr(Y^n = y_{\epsilon,0}^n(i) \mid E_\epsilon, X^n = x^n) \ge 2^{-n(H_0(Y|X) + \epsilon_0)}$$
$$\text{and} \quad \Pr(Y^n = y_{\epsilon,0}^n(i) \mid E_\epsilon, X^n = x^n) \le 2^{-n(H_0(Y|X) - \epsilon_0)},$$

where $H_0(Y|X)$ is calculated according to $p_0(x,y) = p_0(x)p(y|x)$ and $\epsilon_0$ goes to 0 as $\epsilon \to 0$
and $n \to \infty$.

$^1$As in [4], we write $a_n \doteq b_n$ if $\lim_{n \to \infty} \frac{1}{n} \log \frac{a_n}{b_n} = 0$; "$\dot{\ge}$" and "$\dot{\le}$" have similar interpretations.



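Proof: By the definition of $F_{\epsilon,0}(i)$, we have for any $x^n$ in $F_{\epsilon,0}(i)$, $x^n \in A_{\epsilon,0}^{(n)}(X)$ and
$y_{\epsilon,0}^n(i) \in A_{\epsilon}^{(n)}(Y|x^n)$. Then, it follows from the definition of strong typicality that $(x^n, y_{\epsilon,0}^n(i)) \in
A_{\epsilon',0}^{(n)}(X,Y)$, where $\epsilon' \to 0$ as $\epsilon \to 0$. Since strong typicality implies weak typicality, for any $x^n$
in $F_{\epsilon,0}(i)$, we have

$$\left| -\frac{1}{n} \log p(x^n) - H_0(X) \right| < \epsilon''$$

and

$$\left| -\frac{1}{n} \log p(x^n, y_{\epsilon,0}^n(i)) - H_0(X,Y) \right| < \epsilon'',$$

where $\epsilon'' \to 0$ as $\epsilon \to 0$. Thus,

$$\left| -\frac{1}{n} \log p(y_{\epsilon,0}^n(i)|x^n) - H_0(Y|X) \right| < 2\epsilon''.$$

Therefore, for any $x^n \in F_{\epsilon,0}(i)$, we have

$$\Pr(Y^n = y_{\epsilon,0}^n(i) \mid E_\epsilon, X^n = x^n)$$
$$= \frac{\Pr(Y^n = y_{\epsilon,0}^n(i), E_\epsilon, X^n = x^n)}{\Pr(E_\epsilon, X^n = x^n)}$$
$$= \frac{\Pr(Y^n = y_{\epsilon,0}^n(i), X^n = x^n)}{\Pr(E_\epsilon \mid X^n = x^n) \Pr(X^n = x^n)}$$
$$= \frac{p(y_{\epsilon,0}^n(i)|x^n)}{\Pr(E_\epsilon \mid X^n = x^n)}$$
$$= (1 + o(1))\, p(y_{\epsilon,0}^n(i)|x^n)$$
$$\le (1 + o(1))\, 2^{-n(H_0(Y|X) - 2\epsilon'')}$$
$$= 2^{-n(H_0(Y|X) - \epsilon_0)},$$

where $\epsilon_0 := 2\epsilon'' + \frac{\log(1 + o(1))}{n}$ and $\epsilon_0 \to 0$ as $\epsilon \to 0$ and $n \to \infty$. Similarly, for any $x^n \in F_{\epsilon,0}(i)$,
we have

$$\Pr(Y^n = y_{\epsilon,0}^n(i) \mid E_\epsilon, X^n = x^n)$$
$$= (1 + o(1))\, p(y_{\epsilon,0}^n(i)|x^n)$$
$$\ge 2^{-n\left(H_0(Y|X) + 2\epsilon'' - \frac{\log(1 + o(1))}{n}\right)}$$
$$\ge 2^{-n(H_0(Y|X) + \epsilon_0)},$$

which finishes the proof of Lemma 4.1. ■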
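Lemma 4.2: If $\mathcal{C}$ is a typical codebook, then for any $i \in \{1, \ldots, M_{\epsilon,0}^{(n)}\}$,

$$N_{\epsilon,0}(i|\mathcal{C}) \ge 2^{nR} \cdot 2^{-n(I_0(X;Y) + \epsilon_0')} - n^3 R$$
$$\text{and} \quad N_{\epsilon,0}(i|\mathcal{C}) \le 2^{nR} \cdot 2^{-n(I_0(X;Y) - \epsilon_0')} + n^3 R,$$

where $I_0(X;Y)$ is calculated according to $p_0(x)p(y|x)$ and $\epsilon_0'$ goes to 0 as $\epsilon \to 0$ and $n \to \infty$.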

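Proof: To prove Lemma 4.2, we need the following standard result on strong typicality
(see Lemma 13.6.2 in [4]):

Let $X^n$ be drawn i.i.d. according to $p_0(x) = \sum_y p_0(x,y)$. For $y^n \in A_{\epsilon,0}^{(n)}(Y)$,

$$\Pr((X^n, y^n) \in A_{\epsilon,0}^{(n)}(X,Y)) \ge 2^{-n(I_0(X;Y) + \epsilon_1)} \qquad (5)$$
$$\text{and} \quad \Pr((X^n, y^n) \in A_{\epsilon,0}^{(n)}(X,Y)) \le 2^{-n(I_0(X;Y) - \epsilon_1)}, \qquad (6)$$

where $I_0(X;Y)$ is calculated according to $p_0(x,y)$ and $\epsilon_1$ goes to 0 as $\epsilon \to 0$ and $n \to \infty$.
According to the definition of $P_{\epsilon,0}(i)$,

$$P_{\epsilon,0}(i) = \Pr(y_{\epsilon,0}^n(i) \in A_{\epsilon}^{(n)}(Y|X^n) \mid X^n \in A_{\epsilon,0}^{(n)}(X)),$$

where $X^n$ is drawn i.i.d. according to $p_0(x)$.

Since $X^n \in A_{\epsilon,0}^{(n)}(X)$ and $y_{\epsilon,0}^n(i) \in A_{\epsilon}^{(n)}(Y|X^n)$ imply that $(X^n, y_{\epsilon,0}^n(i)) \in A_{\epsilon',0}^{(n)}(X,Y)$, where
$\epsilon'$ goes to 0 as $\epsilon \to 0$, we have

$$P_{\epsilon,0}(i) \le \Pr((X^n, y_{\epsilon,0}^n(i)) \in A_{\epsilon',0}^{(n)}(X,Y) \mid X^n \in A_{\epsilon,0}^{(n)}(X))$$
$$= \frac{\Pr((X^n, y_{\epsilon,0}^n(i)) \in A_{\epsilon',0}^{(n)}(X,Y),\ X^n \in A_{\epsilon,0}^{(n)}(X))}{\Pr(X^n \in A_{\epsilon,0}^{(n)}(X))}$$
$$\le (1 + o(1)) \Pr((X^n, y_{\epsilon,0}^n(i)) \in A_{\epsilon',0}^{(n)}(X,Y))$$
$$\le (1 + o(1))\, 2^{-n(I_0(X;Y) - \epsilon_1')}$$
$$= 2^{-n\left(I_0(X;Y) - \epsilon_1' - \frac{\log(1 + o(1))}{n}\right)}$$
$$= 2^{-n(I_0(X;Y) - \epsilon_2')}, \qquad (7)$$

where $\epsilon_2' := \epsilon_1' + \frac{\log(1 + o(1))}{n}$ and $\epsilon_2' \to 0$ as $\epsilon \to 0$ and $n \to \infty$.

Furthermore, by the standard definitions of strong typicality, it follows that $(x^n, y_{\epsilon,0}^n(i)) \in
A_{\epsilon,0}^{(n)}(X,Y)$ implies $x^n \in A_{\epsilon,0}^{(n)}(X)$. Now, we show that $(x^n, y_{\epsilon,0}^n(i)) \in A_{\epsilon,0}^{(n)}(X,Y)$ also implies $y_{\epsilon,0}^n(i) \in
A_{\epsilon}^{(n)}(Y|x^n)$. Suppose $(x^n, y_{\epsilon,0}^n(i)) \in A_{\epsilon,0}^{(n)}(X,Y)$. Then, we have

1) For all $(a,b) \in \mathcal{X} \times \mathcal{Y}$ with $p(b|a) = 0$, $p_0(a,b) = 0$ and $N(a,b|x^n, y_{\epsilon,0}^n(i)) = 0$.

2) For all $(a,b) \in \mathcal{X} \times \mathcal{Y}$ with $p(b|a) > 0$ and $p_0(a) = 0$, $p_0(a,b) = 0$ and $N(a,b|x^n, y_{\epsilon,0}^n(i)) =
0$, as well as $N(a|x^n) = 0$.

3) For all $(a,b) \in \mathcal{X} \times \mathcal{Y}$ with $p(b|a) > 0$ and $p_0(a) > 0$, $p_0(a,b) > 0$ and

$$\left| \frac{1}{n} N(a|x^n) - p_0(a) \right| < \frac{\epsilon}{|\mathcal{X}|}, \qquad \left| \frac{1}{n} N(a,b|x^n, y_{\epsilon,0}^n(i)) - p_0(a,b) \right| < \frac{\epsilon}{|\mathcal{X}||\mathcal{Y}|}.$$

Thus,

$$\frac{1}{n} N(a,b|x^n, y_{\epsilon,0}^n(i)) - \frac{1}{n} N(a|x^n) p(b|a)$$
$$< p_0(a,b) + \frac{\epsilon}{|\mathcal{X}||\mathcal{Y}|} - p(b|a) \left( p_0(a) - \frac{\epsilon}{|\mathcal{X}|} \right)$$
$$= \frac{\epsilon}{|\mathcal{X}||\mathcal{Y}|} + p(b|a) \frac{\epsilon}{|\mathcal{X}|}$$
$$\le \epsilon \left( 1 + \frac{1}{|\mathcal{Y}|} \right),$$

and the same bound holds for the negated difference by the symmetric argument.

Therefore, $(x^n, y_{\epsilon,0}^n(i)) \in A_{\epsilon,0}^{(n)}(X,Y)$ implies that $y_{\epsilon,0}^n(i) \in A_{\epsilon}^{(n)}(Y|x^n)$, as well as $x^n \in A_{\epsilon,0}^{(n)}(X)$.
Then, we have

$$P_{\epsilon,0}(i) = \Pr(y_{\epsilon,0}^n(i) \in A_{\epsilon}^{(n)}(Y|X^n) \mid X^n \in A_{\epsilon,0}^{(n)}(X))$$
$$\ge \Pr((X^n, y_{\epsilon,0}^n(i)) \in A_{\epsilon,0}^{(n)}(X,Y) \mid X^n \in A_{\epsilon,0}^{(n)}(X))$$
$$= \frac{\Pr((X^n, y_{\epsilon,0}^n(i)) \in A_{\epsilon,0}^{(n)}(X,Y),\ X^n \in A_{\epsilon,0}^{(n)}(X))}{\Pr(X^n \in A_{\epsilon,0}^{(n)}(X))}$$
$$= (1 + o(1)) \Pr((X^n, y_{\epsilon,0}^n(i)) \in A_{\epsilon,0}^{(n)}(X,Y))$$
$$\ge (1 + o(1))\, 2^{-n(I_0(X;Y) + \epsilon_1)}$$
$$= 2^{-n\left(I_0(X;Y) + \epsilon_1 - \frac{\log(1 + o(1))}{n}\right)}$$
$$= 2^{-n(I_0(X;Y) + \epsilon_2)}, \qquad (8)$$

where $\epsilon_2 := \epsilon_1 - \frac{\log(1 + o(1))}{n}$ and $\epsilon_2 \to 0$ as $\epsilon \to 0$ and $n \to \infty$.
Let $\epsilon_0' = \max\{\epsilon_2, \epsilon_2'\}$. Combining (7) and (8), we have

$$2^{-n(I_0(X;Y) + \epsilon_0')} \le P_{\epsilon,0}(i) \le 2^{-n(I_0(X;Y) - \epsilon_0')}. \qquad (9)$$

Therefore, if $\mathcal{C}$ is a typical codebook, by the definition of the typical codebooks and (9), for
any $i \in \{1, \ldots, M_{\epsilon,0}^{(n)}\}$,

$$N_{\epsilon,0}(i|\mathcal{C}) \ge 2^{nR} \cdot 2^{-n(I_0(X;Y) + \epsilon_0')} - n^3 R$$
$$\text{and} \quad N_{\epsilon,0}(i|\mathcal{C}) \le 2^{nR} \cdot 2^{-n(I_0(X;Y) - \epsilon_0')} + n^3 R,$$

where $I_0(X;Y)$ is calculated according to $p_0(x)p(y|x)$ and $\epsilon_0'$ goes to 0 as $\epsilon \to 0$ and $n \to \infty$. ■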

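Proof: [Proof of Theorem 3.1] Let $E_\epsilon$ denote the event $Y^n \in A_{\epsilon}^{(n)}(Y|x^n)$ for any given
input $x^n$. Consider $\Pr(Y^n = y_{\epsilon,0}^n(i) \mid E_\epsilon, \mathcal{C} \text{ is typical})$ for any $i \in \{1, \ldots, M_{\epsilon,0}^{(n)}\}$. We lower bound
this probability as follows:

$$\Pr(Y^n = y_{\epsilon,0}^n(i) \mid E_\epsilon, \mathcal{C} \text{ is typical})$$
$$= \sum_{w=1}^{2^{nR}} \Pr(Y^n = y_{\epsilon,0}^n(i) \mid E_\epsilon, \mathcal{C} \text{ is typical}, X^n = x^n(w)) \cdot \Pr(X^n = x^n(w) \mid E_\epsilon, \mathcal{C} \text{ is typical}) \qquad (10)$$
$$= \frac{1}{2^{nR}} \sum_{w=1}^{2^{nR}} \Pr(Y^n = y_{\epsilon,0}^n(i) \mid E_\epsilon, \mathcal{C} \text{ is typical}, X^n = x^n(w)) \qquad (11)$$
$$= \frac{1}{2^{nR}} \sum_{w:\, x^n(w) \in F_{\epsilon,0}(i)} \Pr(Y^n = y_{\epsilon,0}^n(i) \mid E_\epsilon, \mathcal{C} \text{ is typical}, X^n = x^n(w)) \qquad (12)$$
$$\ge \frac{1}{2^{nR}} N_{\epsilon,0}(i|\mathcal{C}) \cdot 2^{-n(H_0(Y|X) + \epsilon_0)} \qquad (13)$$
$$\ge \frac{1}{2^{nR}} \left( 2^{nR} \cdot 2^{-n(I_0(X;Y) + \epsilon_0')} - n^3 R \right) \cdot 2^{-n(H_0(Y|X) + \epsilon_0)} \qquad (14)$$
$$= 2^{-n(H_0(Y) + \epsilon_0 + \epsilon_0')} \left( 1 - \frac{n^3 R}{2^{nR}} \cdot 2^{n(I_0(X;Y) + \epsilon_0')} \right).$$

(10) follows from the Law of Total Probability and accumulates the contributions from all the
codewords in the codebook to the probability for $y_{\epsilon,0}^n(i)$ to be the channel output.
(11) follows from the uniform distribution of the message index $W$.
(12) follows from the condition $E_\epsilon$ and the fact that $\mathcal{C}$ contains only strongly typical
codewords.
(13) follows from Lemma 4.1.
(14) follows from Lemma 4.2.

Let $\epsilon \to 0$ as $n \to \infty$. Then for any $i \in \{1, \ldots, M_{\epsilon,0}^{(n)}\}$,

$$\Pr(Y^n = y_{\epsilon,0}^n(i) \mid E_\epsilon, \mathcal{C} \text{ is typical}) \mathrel{\dot{\ge}} 2^{-nH_0(Y)}, \qquad (15)$$

when $R > I_0(X;Y)$.

Similarly, following (12), by Lemmas 4.1 and 4.2 we have

$$\Pr(Y^n = y_{\epsilon,0}^n(i) \mid E_\epsilon, \mathcal{C} \text{ is typical})$$
$$\le \frac{1}{2^{nR}} \left( 2^{nR} \cdot 2^{-n(I_0(X;Y) - \epsilon_0')} + n^3 R \right) \cdot 2^{-n(H_0(Y|X) - \epsilon_0)}$$
$$= 2^{-n(H_0(Y) - \epsilon_0 - \epsilon_0')} \left( 1 + \frac{n^3 R}{2^{nR}} \cdot 2^{n(I_0(X;Y) - \epsilon_0')} \right).$$

Therefore, for any $i \in \{1, \ldots, M_{\epsilon,0}^{(n)}\}$,

$$\Pr(Y^n = y_{\epsilon,0}^n(i) \mid E_\epsilon, \mathcal{C} \text{ is typical}) \mathrel{\dot{\le}} 2^{-nH_0(Y)}, \qquad (16)$$

when $R > I_0(X;Y)$. Combining (15) and (16), we establish Theorem 3.1. ■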

V. The Probability that A Typical Codebook Appears 

In this section, we will show that with high probability, a typical codebook will be generated
by the random codebook generation. We begin with some relevant definitions and the
Vapnik-Chervonenkis Theorem [5], [6]:

A Range Space is a pair $(X, \mathcal{F})$, where $X$ is a set and $\mathcal{F}$ is a family of subsets of $X$. For
any $A \subseteq X$, we define $P_{\mathcal{F}}(A)$, the projection of $\mathcal{F}$ on $A$, as $\{F \cap A : F \in \mathcal{F}\}$. We say that $A$
is shattered by $\mathcal{F}$ if $P_{\mathcal{F}}(A) = 2^A$, i.e., if the projection of $\mathcal{F}$ on $A$ includes all possible subsets
of $A$. The VC-dimension of $\mathcal{F}$, denoted by VC-d($\mathcal{F}$), is the cardinality of the largest set $A$ that
$\mathcal{F}$ shatters. If arbitrarily large finite sets are shattered, the VC-dimension of $\mathcal{F}$ is infinite.
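The definitions of projection, shattering and VC-dimension can be checked by brute force on a small range space. The sketch below is an illustration only; the function names are hypothetical, and the exhaustive search is feasible only for tiny ground sets.

```python
import math
from itertools import combinations


def is_shattered(subset, family):
    """True if the projection of family on subset contains all 2^|subset| subsets."""
    projections = {frozenset(member & subset) for member in family}
    return len(projections) == 2 ** len(subset)


def vc_dimension(ground, family):
    """Brute-force VC-dimension of a non-empty finite family over a finite ground set."""
    family = [frozenset(member) for member in family]
    best = 0
    for size in range(1, len(ground) + 1):
        if any(is_shattered(frozenset(a), family) for a in combinations(ground, size)):
            best = size
        else:
            # Subsets of a shattered set are shattered, so larger sizes cannot succeed.
            break
    return best


# A family of singletons, analogous in spirit to indexing one set per sequence.
singletons = [{i} for i in range(4)]
d = vc_dimension(range(4), singletons)
print(d, "<=", math.log2(len(singletons)))
```

Since shattering a set of size $d$ requires at least $2^d$ distinct members, the VC-dimension never exceeds $\log_2$ of the family size; a cardinality bound of this kind is what the argument of this section relies on.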

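The Vapnik-Chervonenkis Theorem: If $\mathcal{F}$ is a set of finite VC-dimension and $\{Y_j\}$ is a sequence
of $n$ i.i.d. random variables with common probability distribution $P$, then for every $\epsilon, \delta > 0$,

$$\Pr\left\{ \sup_{F \in \mathcal{F}} \left| \frac{1}{n} \sum_{j=1}^{n} I(Y_j \in F) - P(F) \right| \le \epsilon \right\} \ge 1 - \delta \qquad (17)$$

whenever

$$n > \max\left\{ \frac{8\,\text{VC-d}(\mathcal{F})}{\epsilon} \log_2 \frac{16e}{\epsilon},\ \frac{4}{\epsilon} \log_2 \frac{2}{\delta} \right\}. \qquad (18)$$

Let $\mathcal{F}_\epsilon = \{F_{\epsilon,0}(i),\ i = 1, \ldots, M_{\epsilon,0}^{(n)}\}$. To show Theorem 3.2, a finite VC-dimension of $\mathcal{F}_\epsilon$
is desired in order to employ the Vapnik-Chervonenkis Theorem. For this reason, we introduce
Lemma 5.1.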



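Lemma 5.1: For a fixed block length $n$, VC-d($\mathcal{F}_\epsilon$) $\le n(H_0(Y) + \epsilon')$, where $\epsilon' \to 0$ as $\epsilon \to 0$.

Proof: By the Asymptotic Equipartition Property, $|\mathcal{F}_\epsilon| = M_{\epsilon,0}^{(n)} \le 2^{n(H_0(Y) + \epsilon')}$, where $\epsilon' \to 0$
as $\epsilon \to 0$. Thus, for any $A \subseteq \mathcal{X}^n$,

$$|\{F_{\epsilon,0}(i) \cap A : F_{\epsilon,0}(i) \in \mathcal{F}_\epsilon\}| \le 2^{n(H_0(Y) + \epsilon')},$$

and hence VC-d($\mathcal{F}_\epsilon$) $\le n(H_0(Y) + \epsilon')$. ■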



Proof: [Proof of Theorem 3.2] Since we reserve only the $\epsilon$-strongly typical codewords
when generating the codebook, for any random codebook, the first condition in Definition 2.5
is obviously satisfied. Below, we focus on showing that a random codebook satisfies the second
condition in Definition 2.5 with high probability.

For the given $p_0(x)$, consider all the codewords in a random codebook, $X^n(w)$, $w = 1, \ldots, 2^{nR}$.
They are generated with the common distribution $p(x^n) = \Pr(X^n = x^n \mid X^n \in A_{\epsilon,0}^{(n)}(X))$, where
$X^n$ is drawn i.i.d. according to $p_0(x)$. Since VC-d($\mathcal{F}_\epsilon$) is finite for a fixed $n$, we employ the
Vapnik-Chervonenkis Theorem under the range space $(\mathcal{X}^n, \mathcal{F}_\epsilon)$. To satisfy (18), let both $\epsilon$ and $\delta$
in (17) be $\frac{A_\epsilon nR}{2^{nR}}$, where $A_\epsilon := \max\{8\,\text{VC-d}(\mathcal{F}_\epsilon), 16e\}$. Then the Vapnik-Chervonenkis Theorem
states that

$$\Pr\left\{ \sup_{i} \left| \frac{N_{\epsilon,0}(i|\mathcal{C})}{2^{nR}} - P_{\epsilon,0}(i) \right| \le \frac{A_\epsilon nR}{2^{nR}} \right\} \ge 1 - \frac{A_\epsilon nR}{2^{nR}} \to 1 \text{ as } n \to \infty, \qquad (19)$$

where $N_{\epsilon,0}(i|\mathcal{C}) = \sum_{w=1}^{2^{nR}} I(X^n(w) \in F_{\epsilon,0}(i))$. Since $\frac{n^3 R}{2^{nR}} > \frac{A_\epsilon nR}{2^{nR}}$ for sufficiently large $n$, (19)
concludes the proof of Theorem 3.2. ■

VI. Proof of Theorem 3.3

Before proceeding to the proof of Theorem 3.3, we first introduce Lemma 6.1, which will
facilitate the later discussions. The proof of Lemma 6.1 is given in Appendix I.

Lemma 6.1: For the channel $(\mathcal{X}, p(y|x), \mathcal{Y})$, generate the codebook at random according to
$p_0(x)$ and reserve only the $\epsilon$-strongly typical codewords. The channel input and output $X^n$ and
$Y^n$ satisfy that

1) $\Pr((X^n, Y^n) \in A_{\epsilon,0}^{(n)}(X,Y)) \to 1$ as $n \to \infty$, for any $\epsilon > 0$;

2) $\lim_{n \to \infty} \frac{1}{n} H(X^n) = H_0(X)$, $\lim_{n \to \infty} \frac{1}{n} H(Y^n) = H_0(Y)$, and $\lim_{n \to \infty} \frac{1}{n} H(X^n, Y^n) =
H_0(X,Y)$.

Remark 6.1: Since we reserve only the $\epsilon$-typical codewords when generating the codebook,
the channel input $X^n$ is in general no longer an i.i.d. random process. However, Lemma 6.1
essentially states that the random process $(X^n, Y^n)$ still satisfies the joint asymptotic equipartition
property and, furthermore, that the entropy rates of the random processes $X^n$, $Y^n$ and $(X^n, Y^n)$ can
still be expressed in single-letter form. This observation will facilitate
our later discussions.

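Proof: [Proof of Theorem 3.3] We prove Theorem 3.3 by characterizing $\lim_{n \to \infty} \frac{1}{n} H(Y^n|\mathcal{C})$
in two different cases: when $R > I_0(X;Y)$ and when $R < I_0(X;Y)$, respectively.

A. When $R > I_0(X;Y)$

Define an indicator random variable $E$ as

$$E := I(E_\epsilon),$$

where $E_\epsilon$ denotes the event $Y^n \in A_{\epsilon}^{(n)}(Y|x^n)$ for any given input $x^n$.

When $R > I_0(X;Y)$, we have

$$H(Y^n|\mathcal{C})$$
$$\ge H(Y^n|E, \mathcal{C}) \qquad (20)$$
$$= \Pr(E = 1) H(Y^n|E = 1, \mathcal{C}) + \Pr(E = 0) H(Y^n|E = 0, \mathcal{C})$$
$$\ge \Pr(E = 1) \cdot H(Y^n|E = 1, \mathcal{C})$$
$$= (1 - o(1)) \cdot H(Y^n|E = 1, \mathcal{C}) \qquad (21)$$
$$= (1 - o(1)) \cdot \sum_{C} p(C) \cdot H(Y^n|E = 1, \mathcal{C} = C)$$
$$\ge (1 - o(1)) \cdot \sum_{C \text{ is typical}} p(C) \cdot H(Y^n|E = 1, \mathcal{C} = C)$$
$$= (1 - o(1)) \cdot \sum_{C \text{ is typical}} p(C) \cdot \left( \sum_{y^n} p(y^n|E_\epsilon, C) \log \frac{1}{p(y^n|E_\epsilon, C)} \right)$$
$$\ge (1 - o(1)) \cdot \sum_{C \text{ is typical}} p(C) \cdot \left( \sum_{y^n \in A_{\epsilon,0}^{(n)}(Y)} p(y^n|E_\epsilon, C) \log \frac{1}{p(y^n|E_\epsilon, C)} \right)$$
$$\ge (1 - o(1)) \cdot \sum_{C \text{ is typical}} p(C) \cdot \left( \sum_{y^n \in A_{\epsilon,0}^{(n)}(Y)} p(y^n|E_\epsilon, C) \log 2^{n(H_0(Y) - \epsilon^*)} \right) \qquad (22)$$
$$= n[H_0(Y) - \epsilon^*] \cdot (1 - o(1)) \cdot \sum_{C \text{ is typical}} p(C) \cdot \left( \sum_{y^n \in A_{\epsilon,0}^{(n)}(Y)} p(y^n|E_\epsilon, C) \right)$$
$$= n[H_0(Y) - \epsilon^*] \cdot (1 - o(1)) \cdot \sum_{C \text{ is typical}} p(C) \cdot \Pr(Y^n \in A_{\epsilon,0}^{(n)}(Y) \mid E_\epsilon, C)$$
$$= n[H_0(Y) - \epsilon^*] \cdot (1 - o(1)) \cdot \sum_{C \text{ is typical}} p(C|E_\epsilon) \cdot \Pr(Y^n \in A_{\epsilon,0}^{(n)}(Y) \mid E_\epsilon, C)$$
$$= n[H_0(Y) - \epsilon^*] \cdot (1 - o(1)) \cdot \Pr(Y^n \in A_{\epsilon,0}^{(n)}(Y), \mathcal{C} \text{ is typical} \mid E_\epsilon)$$
$$= n[H_0(Y) - \epsilon^*] \cdot (1 - o(1)) \cdot (1 - o(1)) \qquad (23)$$
$$= n[H_0(Y) - \epsilon^*] \cdot (1 - o(1)).$$

(20) follows from the fact that conditioning reduces entropy.
(21) follows from the fact that $\Pr(E_\epsilon) \to 1$ as $n \to \infty$, for any $\epsilon > 0$.
(22) follows from Theorem 3.1, which upper bounds $p(y^n|E_\epsilon, C)$ by $2^{-n(H_0(Y) - \epsilon^*)}$ for any
$y^n \in A_{\epsilon,0}^{(n)}(Y)$ and typical $C$, where $\epsilon^* \to 0$ as $n \to \infty$.
(23) follows from the fact that

$$\Pr(Y^n \in A_{\epsilon,0}^{(n)}(Y), \mathcal{C} \text{ is typical} \mid E_\epsilon) \to 1 \text{ as } n \to \infty.$$

This can be seen from the following:

$$\Pr(Y^n \in A_{\epsilon,0}^{(n)}(Y), \mathcal{C} \text{ is typical} \mid E_\epsilon)$$
$$= \frac{\Pr(Y^n \in A_{\epsilon,0}^{(n)}(Y), E_\epsilon, \mathcal{C} \text{ is typical})}{\Pr(E_\epsilon)}$$
$$\ge \frac{\Pr((X^n, Y^n) \in A_{\epsilon,0}^{(n)}(X,Y), E_\epsilon, \mathcal{C} \text{ is typical})}{\Pr(E_\epsilon)}. \qquad (24)$$

Since $\Pr(E_\epsilon)$, $\Pr(\mathcal{C} \text{ is typical})$ and $\Pr((X^n, Y^n) \in A_{\epsilon,0}^{(n)}(X,Y))$ all go to 1, both the
numerator and denominator of (24) go to 1 as $n \to \infty$. Thus,

$$\Pr(Y^n \in A_{\epsilon,0}^{(n)}(Y), \mathcal{C} \text{ is typical} \mid E_\epsilon) \to 1 \text{ as } n \to \infty.$$

Therefore, when $R > I_0(X;Y)$,

$$\liminf_{n \to \infty} \frac{1}{n} H(Y^n|\mathcal{C})$$
$$\ge \liminf_{n \to \infty} \frac{1}{n} \left( n[H_0(Y) - \epsilon^*] \cdot (1 - o(1)) \right)$$
$$= \liminf_{n \to \infty} [H_0(Y) - \epsilon^*] \cdot (1 - o(1))$$
$$= H_0(Y). \qquad (25)$$

Furthermore,

$$\limsup_{n \to \infty} \frac{1}{n} H(Y^n|\mathcal{C}) \le \limsup_{n \to \infty} \frac{1}{n} H(Y^n) = H_0(Y), \qquad (26)$$

where the last equality follows from Lemma 6.1.

Combining (25) and (26), we have that when $R > I_0(X;Y)$,

$$\lim_{n \to \infty} \frac{1}{n} H(Y^n|\mathcal{C}) = H_0(Y).$$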

B. When $R < I_0(X;Y)$

To find $\lim_{n \to \infty} \frac{1}{n} H(Y^n|\mathcal{C})$ when $R < I_0(X;Y)$, we first introduce two lemmas. The proofs
of these two lemmas are given at the end of this section.

Lemma 6.2: When $R < I_0(X;Y)$,

$$\frac{1}{n} H(X^n|\mathcal{C}, Y^n) \to 0, \text{ as } n \to \infty.$$

Lemma 6.3:

$$\lim_{n \to \infty} \frac{1}{n} H(X^n|\mathcal{C}) = R.$$

Now, expanding $H(X^n, Y^n|\mathcal{C})$ in two different ways, we have

$$H(X^n, Y^n|\mathcal{C}) = H(X^n|\mathcal{C}) + H(Y^n|X^n, \mathcal{C})$$
$$= H(Y^n|\mathcal{C}) + H(X^n|\mathcal{C}, Y^n),$$

and thus

$$H(Y^n|\mathcal{C}) = H(X^n|\mathcal{C}) + H(Y^n|X^n, \mathcal{C}) - H(X^n|\mathcal{C}, Y^n).$$

Therefore, when $R < I_0(X;Y)$,

$$\lim_{n \to \infty} \frac{1}{n} H(Y^n|\mathcal{C}) = \lim_{n \to \infty} \frac{1}{n} H(X^n|\mathcal{C}) + \lim_{n \to \infty} \frac{1}{n} H(Y^n|X^n, \mathcal{C}) - \lim_{n \to \infty} \frac{1}{n} H(X^n|\mathcal{C}, Y^n)$$
$$= R + \lim_{n \to \infty} \frac{1}{n} H(Y^n|X^n, \mathcal{C}) \qquad (27)$$
$$= R + \lim_{n \to \infty} \frac{1}{n} H(Y^n|X^n) \qquad (28)$$
$$= R + \lim_{n \to \infty} \frac{1}{n} [H(X^n, Y^n) - H(X^n)]$$
$$= R + H_0(X,Y) - H_0(X) \qquad (29)$$
$$= R + H_0(Y|X),$$

where (27) follows from Lemmas 6.2 and 6.3, (28) follows from the fact that $\mathcal{C} \to X^n \to Y^n$
forms a Markov chain, and (29) follows from Lemma 6.1. This completes the proof of Theorem
3.3. ■

Proof: [Proof of Lemma 6.2] To prove Lemma 6.2, we begin with Fano's Inequality (see
Theorem 2.11.1 in [4]):

Let $P_e = \Pr(g(Y) \ne X)$, where $g$ is any function of $Y$. Then

$$1 + P_e \log |\mathcal{X}| \ge H(X|Y). \qquad (30)$$
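Fano's Inequality in the form (30) can be verified numerically on a small joint distribution. The following sketch is illustrative: the function name and the example joint distribution are assumptions, and the MAP rule is used as one particular choice of the estimator $g$.

```python
import math


def fano_check(joint):
    """joint[x][y] = Pr(X = x, Y = y). Returns (H(X|Y), 1 + P_e * log2|X|) in bits."""
    num_x, num_y = len(joint), len(joint[0])
    p_y = [sum(joint[x][y] for x in range(num_x)) for y in range(num_y)]
    h_x_given_y = -sum(
        joint[x][y] * math.log2(joint[x][y] / p_y[y])
        for x in range(num_x)
        for y in range(num_y)
        if joint[x][y] > 0
    )
    # MAP estimator g(y) = argmax_x Pr(X = x, Y = y); its error probability is P_e.
    p_error = 1.0 - sum(max(joint[x][y] for x in range(num_x)) for y in range(num_y))
    return h_x_given_y, 1.0 + p_error * math.log2(num_x)


# Hypothetical joint distribution on a 3 x 3 alphabet (entries sum to 1).
joint = [
    [0.30, 0.05, 0.05],
    [0.05, 0.20, 0.05],
    [0.02, 0.03, 0.25],
]
lhs, rhs = fano_check(joint)
print(f"H(X|Y) = {lhs:.3f} <= 1 + P_e log|X| = {rhs:.3f}")
```

The bound holds for any estimator, so replacing the MAP rule by a worse one only increases the right-hand side.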

For the channel $(\mathcal{X}, p(y|x), \mathcal{Y})$ with a codebook $C$, we estimate the message index $W$ from
$Y^n$. Let the estimate be $\hat{W} = g(Y^n)$ and $P_e^{(n)}(C) = \Pr(W \ne g(Y^n) \mid C)$. Then, applying Fano's
Inequality, we have

$$H(W|Y^n, \mathcal{C} = C) \le 1 + P_e^{(n)}(C) \log 2^{nR} = 1 + P_e^{(n)}(C)\, nR.$$

Since given $C$, $X^n$ is a function of $W$, say $X^n = X^n(W)$, we have

$$H(X^n|Y^n, \mathcal{C} = C) \le H(W|Y^n, \mathcal{C} = C) \le 1 + P_e^{(n)}(C)\, nR.$$

Then,

$$H(X^n|Y^n, \mathcal{C}) = \sum_{C} p(C) H(X^n|Y^n, \mathcal{C} = C) \le \sum_{C} p(C) (1 + P_e^{(n)}(C)\, nR).$$

Recall the channel coding theorem, which states that if we randomly generate the codebook
according to $p_0(x)$, then when $R < I_0(X;Y)$,

$$\sum_{C} p(C) P_e^{(n)}(C) \to 0. \qquad (31)$$

Therefore, when $R < I_0(X;Y)$,

$$\limsup_{n \to \infty} \frac{1}{n} H(X^n|Y^n, \mathcal{C}) \le \limsup_{n \to \infty} \frac{1}{n} \sum_{C} p(C) [1 + P_e^{(n)}(C)\, nR]$$
$$= \limsup_{n \to \infty} \frac{1}{n} \left[ 1 + nR \sum_{C} p(C) P_e^{(n)}(C) \right]$$
$$\le \limsup_{n \to \infty} \frac{1}{n} + \limsup_{n \to \infty} R \sum_{C} p(C) P_e^{(n)}(C)$$
$$= 0.$$

Furthermore, it is obvious that $\frac{1}{n} H(X^n|Y^n, \mathcal{C}) \ge 0$, and hence

$$\frac{1}{n} H(X^n|\mathcal{C}, Y^n) \to 0, \text{ as } n \to \infty,$$

when $R < I_0(X;Y)$. ■
Proof: [Proof of Lemma 6.3] Given any $C$, $X^n$ is a function of $W$. Thus, $H(X^n|\mathcal{C} = C) \le
H(W|\mathcal{C} = C) = nR$, and

$$\frac{1}{n} H(X^n|\mathcal{C}) = \frac{1}{n} \sum_{C} p(C) H(X^n|\mathcal{C} = C) \le R. \qquad (32)$$

Therefore, to show Lemma 6.3, it suffices to show that $\lim_{n \to \infty} \frac{1}{n} H(X^n|\mathcal{C}) \ge R$. For this
purpose, we first define a class of codebooks as regular codebooks and focus on characterizing
$H(X^n|\mathcal{C} = C)$ for a regular codebook $C$. Then, we show that a regular codebook appears with high
probability when we randomly generate the codebook, and conclude that $\lim_{n \to \infty} \frac{1}{n} H(X^n|\mathcal{C}) \ge
R$.
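The claim that the codeword entropy rate approaches $R$ can be illustrated numerically for a single random codebook. The sketch below makes simplifying assumptions that are not in the paper: binary codewords drawn uniformly at random (so the single-letter input entropy is 1 bit, larger than $R$), a short block length, and no restriction to typical codewords. The function name is hypothetical.

```python
import math
import random
from collections import Counter


def codeword_entropy_rate(n, rate, rng):
    """Return H(X^n | C = c) / n in bits for one uniformly drawn binary codebook c."""
    num_codewords = round(2 ** (n * rate))
    codebook = [rng.getrandbits(n) for _ in range(num_codewords)]
    # With a uniform message index W, p(x^n | c) = N(x^n | c) / 2^{nR}.
    counts = Counter(codebook)
    entropy = -sum(
        (k / num_codewords) * math.log2(k / num_codewords) for k in counts.values()
    )
    return entropy / n


rng = random.Random(0)
rate = 0.5
h = codeword_entropy_rate(12, rate, rng)
print(f"H(X^n|c)/n = {h:.4f}, R = {rate}")
```

The value can never exceed $R$, and it falls below $R$ only to the extent that codewords repeat inside the codebook, which is what the regularity condition controls.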
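We say a codebook $C$ is regular if

$$\sup_{x^n \in A_{\epsilon,0}^{(n)}(X)} \left| \frac{N(x^n|C)}{2^{nR}} - p(x^n) \right| < \frac{n^3 R}{2^{nR}},$$

where $N(x^n|C)$ is the number of occurrences of $x^n$ in $C$, defined by

$$N(x^n|C) = \sum_{w=1}^{2^{nR}} I(x^n(w) = x^n),$$

and $p(x^n) = \Pr(X^n = x^n \mid X^n \in A_{\epsilon,0}^{(n)}(X))$, where $X^n$ is drawn i.i.d. according to $p_0(x)$.
Given a regular $C$, for any $x^n \in A_{\epsilon,0}^{(n)}(X)$, we have

$$N(x^n|C) \le 2^{nR} p(x^n) + n^3 R$$
$$= 2^{nR} \Pr(X^n = x^n \mid X^n \in A_{\epsilon,0}^{(n)}(X)) + n^3 R$$
$$\le 2^{nR} (1 + o(1))\, 2^{-n(H_0(X) - \epsilon')} + n^3 R \qquad (33)$$
$$= n^3 R + o(1), \qquad (34)$$

where the $\epsilon'$ in (33) goes to 0 as $\epsilon \to 0$, and (34) follows from the general assumption that
$R < H_0(X)$. Since the message index $W$ is uniformly distributed, we have for a given $C$
and any $x^n \in A_{\epsilon,0}^{(n)}(X)$,

$$p(x^n|C) = \frac{N(x^n|C)}{2^{nR}} \le \frac{n^3 R + o(1)}{2^{nR}} = 2^{-n(R - \epsilon'')},$$

where $\epsilon''$ goes to 0 as $n \to \infty$. Therefore,

$$H(X^n|\mathcal{C} = C) = \sum_{x^n} p(x^n|C) \log \frac{1}{p(x^n|C)}$$
$$\ge \sum_{x^n} p(x^n|C) \log 2^{n(R - \epsilon'')}$$
$$= [n(R - \epsilon'')] \sum_{x^n} p(x^n|C)$$
$$= n(R - \epsilon'').$$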
=[n(i2-01- 
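Below, we use the Vapnik-Chervonenkis Theorem to show that a regular codebook appears
with high probability.

Let $\mathcal{B} = \{\{x^n\},\ x^n \in A_{\epsilon,0}^{(n)}(X)\}$. Since $|\mathcal{B}| = |A_{\epsilon,0}^{(n)}(X)| \le 2^{n(H_0(X) + \epsilon)}$, for any $A \subseteq \mathcal{X}^n$,

$$|\{\{x^n\} \cap A : x^n \in A_{\epsilon,0}^{(n)}(X)\}| \le 2^{n(H_0(X) + \epsilon)},$$

and hence VC-d($\mathcal{B}$) $\le n(H_0(X) + \epsilon)$.

Since VC-d($\mathcal{B}$) is finite for a fixed $n$, we employ the Vapnik-Chervonenkis Theorem under
the range space $(\mathcal{X}^n, \mathcal{B})$. To satisfy (18), let both $\epsilon$ and $\delta$ in (17) be $\frac{A_\epsilon nR}{2^{nR}}$, where $A_\epsilon :=
\max\{8\,\text{VC-d}(\mathcal{B}), 16e\}$. Then the Vapnik-Chervonenkis Theorem states that

$$\Pr\left\{ \sup_{x^n \in A_{\epsilon,0}^{(n)}(X)} \left| \frac{N(x^n|\mathcal{C})}{2^{nR}} - p(x^n) \right| \le \frac{A_\epsilon nR}{2^{nR}} \right\} \ge 1 - \frac{A_\epsilon nR}{2^{nR}} \to 1 \text{ as } n \to \infty. \qquad (35)$$

Since $\frac{n^3 R}{2^{nR}} > \frac{A_\epsilon nR}{2^{nR}}$ for sufficiently large $n$, (35) concludes that $\Pr(\mathcal{C} \text{ is regular}) \to 1$ as $n \to \infty$.
Therefore,

$$H(X^n|\mathcal{C}) = \sum_{C} p(C) H(X^n|\mathcal{C} = C)$$
$$\ge \sum_{C \text{ is regular}} p(C) H(X^n|\mathcal{C} = C)$$
$$\ge [n(R - \epsilon'')] \sum_{C \text{ is regular}} p(C)$$
$$= [n(R - \epsilon'')](1 - o(1)),$$

and

$$\lim_{n \to \infty} \frac{1}{n} H(X^n|\mathcal{C}) \ge \lim_{n \to \infty} \frac{1}{n} [n(R - \epsilon'')](1 - o(1))$$
$$= \lim_{n \to \infty} (R - \epsilon'')(1 - o(1))$$
$$= R. \qquad (36)$$

Combining (32) and (36), we finish the proof of Lemma 6.3. ■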
VII. Rate Needed to Compress Relay's Observation

To study the optimality of the compress-and-forward strategy, in this section, we investigate 
the rate needed for the relay to losslessly compress its observation. In the classical approach of 
[2], the compression scheme at the relay was only based on the distribution used for generating 
the codebook at the source, without being specific on the codebook generated. However, since 
both the relay and destination have the knowledge of the exact codebook used at the source, it 
is natural to ask whether it is beneficial for the relay to compress its observation based on this 
codebook information. This question motivates us to compare the rates needed to compress the 
relay's observation in two different scenarios: when the relay uses the knowledge of the source's 
codebook and when the relay simply ignores this knowledge. 

Specifically, we consider the two compression problems shown in Figure 1, where $Y^n$ is
generated from $X^n$ through the channel $(\mathcal{X}, p(y|x), \mathcal{Y})$, and $\mathcal{C}$ in (b) is the source's codebook
information available to both the encoder and decoder. Interestingly, we will show that to perfectly
recover $Y^n$, the minimum required rates in both scenarios are the same when the rate $R$ associated
with $\mathcal{C}$ is greater than the channel capacity.



[Figure 1: (a) Compression without using source's codebook information: $Y^n$ is passed through an encoder at rate $R_1$ and a decoder. (b) Compression using source's codebook information: $Y^n$ is passed through an encoder at rate $R_2$ and a decoder.]

Fig. 1. Two scenarios where the relay compresses its observation.

Formally, we have the following theorem: 

Theorem 7.1: For the discrete memoryless channel $(\mathcal{X}, p(y|x), \mathcal{Y})$, generate the codebook at
random according to $p_0(x)$ and reserve only the $\epsilon$-strongly typical codewords. Let $\mathcal{C}$ be the
source's codebook with rate $R$, and $X^n$ and $Y^n$ be the input and output of the channel respectively.
When $R > I_0(X;Y)$, to compress the channel output $Y^n$, we have

1) $Y^n$ can be encoded at rate $R_1$ and recovered with arbitrarily low probability of error if
$R_1 > H_0(Y)$.

2) Given that the source's codebook information $\mathcal{C}$ is available to both the encoder and decoder
and $Y^n$ is encoded at rate $R_2$, the decoding probability of error will be bounded away from zero
if $R_2 < H_0(Y)$, which implies that we cannot compress the channel output better even if the
source's codebook information is employed.

To show Theorem 7.1, we need the following lemma.

Lemma 7.1: For the compression problem in Figure 1-(b), we can encode $Y^n$ at rate $R_2$ and
recover it with the probability of error $P_e^{(n)} \to 0$ only if

$$R_2 \ge \lim_{n \to \infty} \frac{1}{n} H(Y^n|\mathcal{C}). \qquad (37)$$

Proof: [Proof of Lemma 7.1] The source code for Figure 1-(b) consists of an encoder
mapping $f(Y^n, \mathcal{C})$ and a decoder mapping $g(f(Y^n, \mathcal{C}), \mathcal{C})$. Let $I = f(Y^n, \mathcal{C})$; then $P_e^{(n)} =
\Pr(g(I, \mathcal{C}) \ne Y^n)$. By Fano's Inequality, for any source code with $P_e^{(n)} \to 0$, we have

$$H(Y^n|I, \mathcal{C}) \le P_e^{(n)} \log |\mathcal{Y}^n| + 1 = P_e^{(n)} n \log |\mathcal{Y}| + 1 = n\epsilon_n, \qquad (38)$$

where $\epsilon_n \to 0$ as $n \to \infty$.

Therefore, for any source code with rate $R_2$ and $P_e^{(n)} \to 0$, we have the following chain of
inequalities:

$$nR_2 \ge H(I) \qquad (39)$$
$$\ge H(I|\mathcal{C})$$
$$= H(Y^n, I|\mathcal{C}) - H(Y^n|I, \mathcal{C})$$
$$= H(Y^n|\mathcal{C}) + H(I|Y^n, \mathcal{C}) - H(Y^n|I, \mathcal{C})$$
$$= H(Y^n|\mathcal{C}) - H(Y^n|I, \mathcal{C}) \qquad (40)$$
$$\ge H(Y^n|\mathcal{C}) - n\epsilon_n, \qquad (41)$$

where (39) follows from the fact that $I \in \{1, 2, \ldots, 2^{nR_2}\}$, (40) follows from the fact that $I$ is a
function of $Y^n$ and $\mathcal{C}$, and (41) follows from (38). Dividing the inequality $nR_2 \ge H(Y^n|\mathcal{C}) - n\epsilon_n$
by $n$ and taking the limit as $n \to \infty$, we establish Lemma 7.1. ■



Proof: [Proof of Theorem 7.1]
Proof of Part 1): To show Part 1), we only need to show that the sequence $Y^n$ satisfies the
Asymptotic Equipartition Property, i.e., $\Pr(Y^n \in A_{\epsilon,0}^{(n)}(Y)) \to 1$, as $n \to \infty$. Then, following the
classical approach to show the source coding theorem, we can conclude that the rate $R_1 > H_0(Y)$
is achievable. By Lemma 6.1, $\Pr((X^n, Y^n) \in A_{\epsilon,0}^{(n)}(X,Y)) \to 1$ as $n \to \infty$. Thus, the sequence
$Y^n$ satisfies the Asymptotic Equipartition Property and the rate $R_1 > H_0(Y)$ is achievable.

Proof of Part 2): By Lemma 7.1, given that the codebook information $\mathcal{C}$ is available to both the
encoder and decoder and $Y^n$ is encoded at rate $R_2$, $P_e^{(n)} \to 0$ only if $R_2 \ge \lim_{n \to \infty} \frac{1}{n} H(Y^n|\mathcal{C})$.
By Theorem 3.3, $\lim_{n \to \infty} \frac{1}{n} H(Y^n|\mathcal{C}) = H_0(Y)$ when $R > I_0(X;Y)$. Therefore, when $R >
I_0(X;Y)$, $P_e^{(n)} \to 0$ only if $R_2 \ge H_0(Y)$, which establishes Part 2). ■

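Appendix I
Proof of Lemma 6.1

Proof of Part 1): Let $\tilde{X}^n$ be drawn i.i.d. according to $p_0(x)$ and $\tilde{Y}^n$ be generated from $\tilde{X}^n$
through the channel $(\mathcal{X}, p(y|x), \mathcal{Y})$. Then, we have

$$\Pr((X^n, Y^n) \in A_{\epsilon,0}^{(n)}(X,Y))$$
$$= \sum_{(x^n, y^n) \in A_{\epsilon,0}^{(n)}(X,Y)} p(x^n) p(y^n|x^n)$$
$$= \sum_{(x^n, y^n) \in A_{\epsilon,0}^{(n)}(X,Y)} \Pr(\tilde{X}^n = x^n \mid \tilde{X}^n \in A_{\epsilon,0}^{(n)}(X)) \cdot \Pr(Y^n = y^n \mid X^n = x^n)$$
$$= \sum_{(x^n, y^n) \in A_{\epsilon,0}^{(n)}(X,Y)} \Pr(\tilde{X}^n = x^n \mid \tilde{X}^n \in A_{\epsilon,0}^{(n)}(X)) \cdot \Pr(\tilde{Y}^n = y^n \mid \tilde{X}^n = x^n)$$
$$= \sum_{(x^n, y^n) \in A_{\epsilon,0}^{(n)}(X,Y)} \Pr((\tilde{X}^n, \tilde{Y}^n) = (x^n, y^n) \mid \tilde{X}^n \in A_{\epsilon,0}^{(n)}(X))$$
$$= \Pr((\tilde{X}^n, \tilde{Y}^n) \in A_{\epsilon,0}^{(n)}(X,Y) \mid \tilde{X}^n \in A_{\epsilon,0}^{(n)}(X))$$
$$= \frac{\Pr((\tilde{X}^n, \tilde{Y}^n) \in A_{\epsilon,0}^{(n)}(X,Y))}{\Pr(\tilde{X}^n \in A_{\epsilon,0}^{(n)}(X))}$$
$$\to 1, \text{ as } n \to \infty.$$

Proof of Part 2): Denote the $\epsilon$-weakly typical sets with respect to $p_0(x)$, $p_0(y)$ and $p_0(x,y)$ by
$W_{\epsilon,0}^{(n)}(X)$, $W_{\epsilon,0}^{(n)}(Y)$ and $W_{\epsilon,0}^{(n)}(X,Y)$ respectively. Along the same lines as in the proof of Part 1),
we can prove that $\Pr((X^n, Y^n) \in W_{\epsilon,0}^{(n)}(X,Y)) \to 1$ as $n \to \infty$, and hence $\Pr(X^n \in W_{\epsilon,0}^{(n)}(X))$
and $\Pr(Y^n \in W_{\epsilon,0}^{(n)}(Y))$ both go to 1 as $n \to \infty$.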
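Now, consider $H(Y^n)$. We have

$$H(Y^n) = \sum_{y^n} p(y^n) \log \frac{1}{p(y^n)}$$
$$= \sum_{y^n \in W_{\epsilon,0}^{(n)}(Y)} p(y^n) \log \frac{1}{p(y^n)} + \sum_{y^n \notin W_{\epsilon,0}^{(n)}(Y)} p(y^n) \log \frac{1}{p(y^n)}$$
$$=: \theta_1 + \theta_2.$$

For $\theta_1$, we have

$$\theta_1 = \sum_{y^n \in W_{\epsilon,0}^{(n)}(Y)} p(y^n) \log \frac{1}{p(y^n)}$$
$$\le \sum_{y^n \in W_{\epsilon,0}^{(n)}(Y)} p(y^n) \log 2^{n(H_0(Y) + \epsilon)}$$
$$= n(H_0(Y) + \epsilon) \Pr(Y^n \in W_{\epsilon,0}^{(n)}(Y))$$
$$= n(H_0(Y) + \epsilon)(1 - o(1)),$$

where the inequality follows from the fact that $p(y^n) \ge 2^{-n(H_0(Y) + \epsilon)}$ for any $y^n \in W_{\epsilon,0}^{(n)}(Y)$.
For $\theta_2$, we have

$$\theta_2 = \sum_{y^n \notin W_{\epsilon,0}^{(n)}(Y)} p(y^n) \log \frac{1}{p(y^n)}$$
$$\le -\left( \sum_{y^n \notin W_{\epsilon,0}^{(n)}(Y)} p(y^n) \right) \log \frac{\sum_{y^n \notin W_{\epsilon,0}^{(n)}(Y)} p(y^n)}{|W_{\epsilon,0}^{(n)c}(Y)|} \qquad (42)$$
$$= -\Pr(Y^n \notin W_{\epsilon,0}^{(n)}(Y)) \log \frac{\Pr(Y^n \notin W_{\epsilon,0}^{(n)}(Y))}{|W_{\epsilon,0}^{(n)c}(Y)|}$$
$$= -\Pr(Y^n \notin W_{\epsilon,0}^{(n)}(Y)) \log \Pr(Y^n \notin W_{\epsilon,0}^{(n)}(Y)) + \Pr(Y^n \notin W_{\epsilon,0}^{(n)}(Y)) \log |W_{\epsilon,0}^{(n)c}(Y)|$$
$$= o(1) + \Pr(Y^n \notin W_{\epsilon,0}^{(n)}(Y)) \log |W_{\epsilon,0}^{(n)c}(Y)| \qquad (43)$$
$$\le o(1) + \Pr(Y^n \notin W_{\epsilon,0}^{(n)}(Y)) \log |\mathcal{Y}|^n$$
$$= o(1) + n \cdot \Pr(Y^n \notin W_{\epsilon,0}^{(n)}(Y)) \log |\mathcal{Y}|$$
$$= n \cdot o(1). \qquad (44)$$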
(42) follows from the log sum inequality (see Theorem 2.7.1 in [4]), which states that for
non-negative numbers $a_1, a_2, \ldots, a_n$ and $b_1, b_2, \ldots, b_n$,

$$\sum_{i=1}^{n} a_i \log \frac{a_i}{b_i} \ge \left( \sum_{i=1}^{n} a_i \right) \log \frac{\sum_{i=1}^{n} a_i}{\sum_{i=1}^{n} b_i},$$

with equality if and only if $\frac{a_i}{b_i}$ are equal for all $i$.
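The log sum inequality is easy to check numerically. The following minimal sketch (the function name is hypothetical, and strictly positive $b_i$ are assumed to avoid division by zero) evaluates both sides; choosing all $b_i = 1$ corresponds to the way the inequality is applied to bound $\theta_2$.

```python
import math


def log_sum_sides(a, b):
    """Return (sum_i a_i log2(a_i / b_i), (sum a) log2(sum a / sum b)); assumes b_i > 0."""
    lhs = sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)
    total_a, total_b = sum(a), sum(b)
    rhs = total_a * math.log2(total_a / total_b) if total_a > 0 else 0.0
    return lhs, rhs


# Probabilities of three atypical sequences, compared against b_i = 1.
lhs, rhs = log_sum_sides([0.10, 0.20, 0.05], [1.0, 1.0, 1.0])
print(lhs >= rhs)

# Equality case: the ratios a_i / b_i are all equal.
lhs_eq, rhs_eq = log_sum_sides([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
print(math.isclose(lhs_eq, rhs_eq))
```

Negating both sides with $b_i = 1$ gives an upper bound on $\sum_i a_i \log \frac{1}{a_i}$ in terms of the total mass and the number of terms.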

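(43) and (44) both follow from the fact that $\Pr(Y^n \in W_{\epsilon,0}^{(n)}(Y)) \to 1$ as $n \to \infty$.
Therefore,

$$H(Y^n) = \theta_1 + \theta_2$$
$$\le n(H_0(Y) + \epsilon)(1 - o(1)) + n \cdot o(1)$$
$$= n(H_0(Y) + \epsilon)(1 - o(1)). \qquad (45)$$

Similarly, we have

$$H(Y^n) \ge \sum_{y^n \in W_{\epsilon,0}^{(n)}(Y)} p(y^n) \log \frac{1}{p(y^n)}$$
$$\ge \sum_{y^n \in W_{\epsilon,0}^{(n)}(Y)} p(y^n) \log 2^{n(H_0(Y) - \epsilon)}$$
$$= n(H_0(Y) - \epsilon) \Pr(Y^n \in W_{\epsilon,0}^{(n)}(Y))$$
$$= n(H_0(Y) - \epsilon)(1 - o(1)). \qquad (46)$$

Combining (45) and (46), we have $\lim_{n \to \infty} \frac{1}{n} H(Y^n) = H_0(Y)$.

Along the same lines as above, we can also prove that $\lim_{n \to \infty} \frac{1}{n} H(X^n) = H_0(X)$ and
$\lim_{n \to \infty} \frac{1}{n} H(X^n, Y^n) = H_0(X,Y)$, which concludes the proof of Lemma 6.1.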